We wanted to explore data on Early Modern witchcraft trials. Witchcraft trials peaked in the Early Modern period, typically in civil courts. While each case is unique, common threads include the accused implicating others, torture used to extract confessions, and the disproportionate targeting of women and of people with weak ties to their communities. We wanted to take a data-driven approach to understanding the witchcraft trials better.
To do this we rely on two datasets. The first, assembled by Jake Russ, is a collection of trials and battles across Europe with summary information. The second is an in-depth collection of data on Scottish witchcraft trials from Julian Goodare et al. It includes fewer trials but has more detailed information, such as the names of the accused, torture methods involved, and religious or supernatural motifs.
Could the century or decade be a factor in the number of witch trials?
Everyone was involved in developing research questions, and we all presented our work to each other and solicited input. Kyle explored clustering the geographic data and examining the data by country. Anand worked on the visualizations for the geographic data by category. Calvin worked on cleaning the Scottish data, visualizing relations, and assembling Part 1 of the report. Anand and Calvin created more visualizations for urbanization vs. trials, including another cluster graph and a scatterplot, and we answered our research question "How did population and urbanization affect witchcraft trials?". Anand assembled Part 2 of the report. For Part 3, Kyle created the interactive map at the end of the witchcraft trials geographic data section, Calvin answered the fourth research question, and Anand answered the fifth research question and assembled Part 3 of the report.
This data was relatively clean already. We used the panel_dataset.csv file for processing.
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
The trials are broken into decade and country. Other columns of interest are population, urbanization, weather, temperature, real.wage, and taxes.percap.
witchTrialsData = pd.read_csv("panel_dataset.csv")
witchTrialsData
| country | decade | century | trials | battles | population | trials.mil | battles.mil | ln.trials | ln.trials.mil | ... | battles.tp2 | battles.tp3 | battles.mil.tp1 | battles.mil.tp2 | battles.mil.tp3 | weather | urbanization | real.wage | taxes.percap | temperature | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Austria | 1300.0 | 1300.0 | 0.0 | 0.0 | 2000000.0 | 0.000 | 0.0 | NaN | NaN | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 1.000 | NaN | NaN | NaN |
| 1 | Austria | 1310.0 | 1300.0 | 1.0 | 0.0 | 1925000.0 | 0.519 | 0.0 | 0.000000 | -0.655851 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 1.000 | NaN | NaN | NaN |
| 2 | Austria | 1320.0 | 1300.0 | 3.0 | 0.0 | 1850000.0 | 1.622 | 0.0 | 1.098612 | 0.483660 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 1.000 | NaN | NaN | NaN |
| 3 | Austria | 1330.0 | 1300.0 | 1.0 | 0.0 | 1775000.0 | 0.563 | 0.0 | 0.000000 | -0.574476 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 1.000 | NaN | NaN | NaN |
| 4 | Austria | 1340.0 | 1300.0 | 1.0 | 0.0 | 1700000.0 | 0.588 | 0.0 | 0.000000 | -0.531028 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 1.000 | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1171 | Switzerland | 1810.0 | 1800.0 | 0.0 | 0.0 | 1900000.0 | 0.000 | 0.0 | NaN | NaN | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 5.714 | NaN | NaN | NaN |
| 1172 | Switzerland | 1820.0 | 1800.0 | 0.0 | 0.0 | 2050000.0 | 0.000 | 0.0 | NaN | NaN | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | NaN | 5.714 | NaN | NaN | NaN |
| 1173 | Switzerland | 1830.0 | 1800.0 | 0.0 | 0.0 | 2200000.0 | 0.000 | 0.0 | NaN | NaN | ... | 0.0 | NaN | 0.0 | 0.0 | NaN | NaN | 5.714 | NaN | NaN | NaN |
| 1174 | Switzerland | 1840.0 | 1800.0 | 0.0 | 0.0 | 2350000.0 | 0.000 | 0.0 | NaN | NaN | ... | NaN | NaN | 0.0 | NaN | NaN | NaN | 5.714 | NaN | NaN | NaN |
| 1175 | Switzerland | 1850.0 | 1800.0 | 0.0 | 0.0 | 2500000.0 | 0.000 | 0.0 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | 5.714 | NaN | NaN | NaN |
1176 rows × 23 columns
We can see that witchcraft trials peaked in the 17th century, with a more gradual lead-up and a steep drop-off afterward.
witchTrialsData.plot.scatter(x="century",y="trials", figsize=(12,7))
<AxesSubplot:xlabel='century', ylabel='trials'>
witchTrialsData.groupby('century').sum().reset_index().plot.bar(x="century",y="trials", figsize=(12,7))
<AxesSubplot:xlabel='century'>
witchTrialsData.plot.scatter(x="decade",y="trials", figsize=(12,7))
<AxesSubplot:xlabel='decade', ylabel='trials'>
witchTrialsData.groupby('decade').sum().reset_index().plot.bar(x="decade",y="trials", figsize=(12,7))
<AxesSubplot:xlabel='decade'>
The countries with the highest peak decades are Germany, Switzerland, Scotland, and Spain. Spain is an interesting outlier here: after an initial burst of trials, Spanish inquisitors essentially forbade witchcraft investigations by requiring significant oversight and bureaucratic hurdles. The other three countries also had very high total trials.
ax = witchTrialsData.plot.scatter(x="country",y="trials", figsize=(12,7))
plt.xticks(rotation = 90);
ax = witchTrialsData.groupby('country').sum().reset_index().plot.bar(x="country",y="trials", figsize=(12,7))
plt.xticks(rotation = 90);
When looking at the relation between population and trials we get some unusual peaks. These could be specific centers of witchcraft trials, or there might be some other relation here.
witchTrialsData.plot.scatter(x="population",y="trials", figsize=(12,7))
<AxesSubplot:xlabel='population', ylabel='trials'>
witchTrialsData.plot.scatter(x="century",y="urbanization")
<AxesSubplot:xlabel='century', ylabel='urbanization'>
As we can see, urbanization started to pick up in the 16th and 17th centuries. Another graph showed that witchcraft trials also picked up around then, so this could be some evidence that urbanization played a role in the trials.
import numpy as np
witch = witchTrialsData[['century','urbanization','trials']].dropna()
#witch.urbanization.value_counts()
np.polyfit(witch.urbanization, witch.trials, 1)
witch.urbanization.corr(witch.trials)
-0.008886409759129212
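The polyfit call above returns the least-squares slope and intercept. As a sanity check on the near-zero correlation, a small synthetic stand-in for the `witch` frame (the real one is built from witchTrialsData above) shows how Pearson's r is just the regression slope rescaled by the ratio of standard deviations:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for witch[['urbanization', 'trials']] so the snippet
# runs on its own; trials here are deliberately independent of urbanization
rng = np.random.default_rng(0)
witch = pd.DataFrame({'urbanization': rng.uniform(0, 40, 500)})
witch['trials'] = rng.poisson(5, 500).astype(float)

# polyfit returns [slope, intercept] for a degree-1 fit
slope, intercept = np.polyfit(witch.urbanization, witch.trials, 1)
corr = witch.urbanization.corr(witch.trials)

# Pearson r equals the OLS slope times std(x) / std(y)
rescaled = slope * witch.urbanization.std() / witch.trials.std()
```

So a correlation of roughly -0.009 corresponds to an essentially flat fitted line over the urbanization range.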
import sklearn.cluster as cl
normWitch = (witch - witch.mean()) / witch.std()
km = cl.KMeans(n_clusters=4)
labels = km.fit_predict(normWitch[['urbanization','trials']])
normWitch['CID'] = labels
witchTrialsData['CID']=normWitch['CID']
urb = witchTrialsData.plot.scatter(x="urbanization",y="trials", c = 'CID', cmap = 'rainbow', sharex = False, figsize = (12,7))
We found four clusters: two with low trial counts (one at low urbanization and one at high urbanization), an intriguing cluster with some urbanization and higher trial counts, and a final cluster of outliers at the very top, with very high trial counts and some urbanization. The correlation coefficient above showed that urbanization and trials weren't linearly correlated, but there is some relation present in our clustering.
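One way to check whether k = 4 is a sensible choice here is the average silhouette score, which rewards tight, well-separated clusters. A sketch on synthetic two-column data standing in for the normalized urbanization/trials features:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Four well-separated synthetic blobs standing in for the normalized features
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0, 0.3, size=(50, 2)),
    rng.normal(3, 0.3, size=(50, 2)),
    rng.normal([0, 3], 0.3, size=(50, 2)),
    rng.normal([3, 0], 0.3, size=(50, 2)),
])

# Average silhouette per k (closer to 1 = tighter, better-separated clusters)
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)
```

On the real normalized data the same loop would tell us whether four clusters is actually supported or just the number we asked for.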
import sklearn.cluster as cl
# new dataframe with the columns we want to utilize
panel_ds = witchTrialsData
panel_cleaned = panel_ds[['country', 'trials', 'battles', 'population', 'urbanization', 'real.wage']]
# grouping by country in order to get the sum of all the columns from 1300-1910
panel_gb = panel_cleaned.groupby('country').sum()
panelDF = panel_gb.reset_index()
abbrevs = ['AT', 'BE', 'CZ', 'DK', 'EN', 'EE', 'FI', 'FR', 'DE', 'HU', 'IE', 'IT', 'LU', 'NL', 'NIR', 'NO', 'PL', 'SCT', 'ES', 'SE', 'CH']
panelDF.insert(1, column='abbrev', value=abbrevs)
panelDF
| country | abbrev | trials | battles | population | urbanization | real.wage | |
|---|---|---|---|---|---|---|---|
| 0 | Austria | AT | 83.0 | 5.0 | 1.295000e+08 | 230.600 | 274.269323 |
| 1 | Belgium | BE | 887.0 | 25.0 | 9.185425e+07 | 1525.334 | 535.199068 |
| 2 | Czech Republic | CZ | 2.0 | 9.0 | 1.482342e+08 | 265.596 | 0.000000 |
| 3 | Denmark | DK | 90.0 | 2.0 | 4.205000e+07 | 260.860 | 0.000000 |
| 4 | England | EN | 1197.0 | 65.0 | 2.726001e+08 | 566.168 | 520.948463 |
| 5 | Estonia | EE | 205.0 | 0.0 | 1.252250e+07 | 0.000 | 0.000000 |
| 6 | Finland | FI | 710.0 | 0.0 | 1.887500e+07 | 8.250 | 0.000000 |
| 7 | France | FR | 4159.0 | 64.0 | 1.075500e+09 | 452.528 | 326.752922 |
| 8 | Germany | DE | 16474.0 | 104.0 | 6.810000e+08 | 361.602 | 248.966022 |
| 9 | Hungary | HU | 1644.0 | 0.0 | 9.662500e+07 | 177.670 | 0.000000 |
| 10 | Ireland | IE | 52.0 | 22.0 | 9.176650e+07 | 202.280 | 0.000000 |
| 11 | Italy | IT | 604.0 | 15.0 | 6.970000e+08 | 977.300 | 407.876052 |
| 12 | Luxembourg | LU | 219.0 | 0.0 | 7.770750e+06 | 23.670 | 0.000000 |
| 13 | Netherlands | NL | 369.0 | 54.0 | 8.027500e+07 | 1250.120 | 327.578474 |
| 14 | Northern Ireland | NIR | 9.0 | 6.0 | 1.813350e+07 | 40.176 | 0.000000 |
| 15 | Norway | NO | 863.0 | 0.0 | 3.167500e+07 | 96.338 | 0.000000 |
| 16 | Poland | PL | 12.0 | 3.0 | 3.137500e+08 | 173.698 | 397.582514 |
| 17 | Scotland | SCT | 3563.0 | 32.0 | 4.675000e+07 | 344.750 | 0.000000 |
| 18 | Spain | ES | 1949.0 | 8.0 | 4.575000e+08 | 764.030 | 357.571895 |
| 19 | Sweden | SE | 353.0 | 0.0 | 7.340000e+07 | 82.340 | 0.000000 |
| 20 | Switzerland | CH | 9796.0 | 4.0 | 6.152500e+07 | 189.084 | 0.000000 |
We looked specifically at trials, battles, population, urbanization, and real.wage to find clusters among countries. The elbow plot suggests around k = 5 clusters using k-means.
panelDF_norm = panelDF.copy()
panelDF_norm.index = panelDF_norm.country
panelDF_norm = panelDF_norm.iloc[:,2:]
panelDF_norm = (panelDF_norm - panelDF_norm.mean())/panelDF_norm.std()
# Elbow plot
elbow_data = []
for i in range(2, 22):
    km = cl.KMeans(n_clusters=i)
    labels = km.fit_predict(panelDF_norm)
    elbow_data.append((i, km.inertia_))
elbowDF = pd.DataFrame(elbow_data, columns=['k', 'WCV'])
elbowDF.plot.line(x='k', y='WCV')
<AxesSubplot:xlabel='k'>
We can compare the z-score normalized metrics for each country. Switzerland is unusual: while Germany and France have high values in most metrics including trials, Switzerland has low values in nearly everything except trials. Conversely, the Netherlands and Belgium (part of a single polity for most of the period covered by the data), England, and Italy all have low trials but high values in the other metrics.
panelDF_norm = panelDF.copy()
panelDF_norm.index = panelDF_norm.country
panelDF_norm = panelDF_norm.iloc[:,2:]
panelDF_norm = (panelDF_norm - panelDF_norm.mean())/panelDF_norm.std()
panelDF_norm.plot.bar(figsize=(16,8))
<AxesSubplot:xlabel='country'>
We can annotate our cluster labels on countries and view this in lower-dimensional projections via several different plots. Population vs. trials seems to best capture the different clusters.
# trying to find clusters
km = cl.KMeans(n_clusters=6)
# labels = km.fit_predict(panelDF.iloc[:,2:])
labels = km.fit_predict(panelDF_norm)
panelDF_norm['Cluster'] = labels
panelDF_norm.index = panelDF.abbrev
# trials and battles
tb_plot = panelDF_norm.plot.scatter(x='battles', y='trials', c='Cluster', cmap='viridis', figsize=(16,8), s=100, sharex=False)
for k, v in panelDF_norm.iterrows():
    tb_plot.text(v.battles - 0.08, v.trials - 0.14, k)
# trials and urbanization
km = cl.KMeans(n_clusters=6)
labels = km.fit_predict(panelDF_norm)
panelDF_norm['Cluster'] = labels
panelDF_norm.index = panelDF.abbrev
# trials and urbanization
tb_plot = panelDF_norm.plot.scatter(x='urbanization', y='trials', c='Cluster', cmap='viridis', figsize=(16,8), s=100, sharex=False)
for k, v in panelDF_norm.iterrows():
    tb_plot.text(v.urbanization - 0.1, v.trials - 0.16, k)
# trials and population
km = cl.KMeans(n_clusters=6)
labels = km.fit_predict(panelDF_norm)
panelDF_norm['Cluster'] = labels
panelDF_norm.index = panelDF.abbrev
# trials and population
tb_plot = panelDF_norm.plot.scatter(x='population', y='trials', c='Cluster', cmap='viridis', figsize=(16,8), s=100, sharex=False)
for k, v in panelDF_norm.iterrows():
    tb_plot.text(v.population - 0.1, v.trials - 0.16, k)
witchTrialsData.plot.scatter(x="weather",y="trials")
<AxesSubplot:xlabel='weather', ylabel='trials'>
Negative weather values indicate less precipitation, while positive values mean more. In this scatter plot there tend to be higher trial counts toward negative weather values, so we might guess that droughts had something to do with an increased number of witch trials. Eyeballing the plot, more trials in general also seem to have occurred when there was less precipitation.
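The eyeball comparison above can be quantified by splitting decades into drier (weather < 0) and wetter (weather >= 0) groups and comparing mean trials. A sketch on synthetic data standing in for witchTrialsData[['weather', 'trials']] (the drought effect is built in here purely for illustration):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: drier decades are given a higher trial rate on purpose
rng = np.random.default_rng(1)
weather = rng.normal(0, 1, 500)
trials = rng.poisson(np.where(weather < 0, 8, 4))
df = pd.DataFrame({'weather': weather, 'trials': trials})

# Mean trials in drier vs. wetter decades
dry_mean = df.loc[df.weather < 0, 'trials'].mean()
wet_mean = df.loc[df.weather >= 0, 'trials'].mean()
```

Running the same two-line comparison on the real columns would turn "it looks like more trials when drier" into a number we can report.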
witchTrialsData.plot.scatter(x="temperature",y="trials")
<AxesSubplot:xlabel='temperature', ylabel='trials'>
Negative temperature values correspond to colder weather and positive values to warmer weather. Eyeballing this scatterplot, the higher trial counts tend to occur at colder temperatures, as does the overall bulk of trials.
witchTrialsData.plot.scatter(x="temperature",y="trials.mil")
<AxesSubplot:xlabel='temperature', ylabel='trials.mil'>
Just like with raw trials, the scatterplot for trials per million also tends to show higher values toward colder temperatures. The highest value occurred near 0, while most of the other high values occurred at negative temperatures. More trials.mil in general also seem to have taken place at colder temperatures.
witchTrialsData.plot.scatter(x="weather",y="trials.mil")
<AxesSubplot:xlabel='weather', ylabel='trials.mil'>
With trials.mil plotted against weather, the highest point is around a weather value of 0, and more of the high trials.mil values fall at negative weather values. The bulk of trials.mil also occurred in conditions with less precipitation, matching what the trials vs. weather scatterplot showed.
trialsData = pd.read_csv("data/witch-trials/data/trials.csv")
trialsData.head()
| year | decade | century | tried | deaths | city | gadm.adm2 | gadm.adm1 | gadm.adm0 | lon | lat | record.source | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | 1520 | 1500 | 1 | 1.0 | NaN | NaN | NaN | Estonia | NaN | NaN | Madar (1990) |
| 1 | NaN | 1530 | 1500 | 1 | 1.0 | NaN | NaN | NaN | Estonia | NaN | NaN | Madar (1990) |
| 2 | NaN | 1540 | 1500 | 5 | 5.0 | NaN | NaN | NaN | Estonia | NaN | NaN | Madar (1990) |
| 3 | NaN | 1580 | 1500 | 7 | 5.0 | NaN | NaN | NaN | Estonia | NaN | NaN | Madar (1990) |
| 4 | NaN | 1590 | 1500 | 11 | 0.0 | NaN | NaN | NaN | Estonia | NaN | NaN | Madar (1990) |
trialsDF = trialsData.dropna(subset=(['lon', 'lat']))
trialsDF
| year | decade | century | tried | deaths | city | gadm.adm2 | gadm.adm1 | gadm.adm0 | lon | lat | record.source | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 539 | 1500.0 | 1500 | 1500 | 2 | NaN | Isny im Allgau | Freiburg | Baden-Wurttemberg | Germany | 10.04200 | 47.6950 | Behringer (1987) |
| 540 | 1578.0 | 1570 | 1500 | 1 | NaN | Isny im Allgau | Freiburg | Baden-Wurttemberg | Germany | 10.04200 | 47.6950 | Behringer (1987) |
| 541 | 1590.0 | 1590 | 1500 | 6 | 6.0 | Erolzheim | Freiburg | Baden-Wurttemberg | Germany | 10.07080 | 48.0917 | Behringer (1987) |
| 542 | 1630.0 | 1630 | 1600 | 1 | NaN | Heidelberg | Freiburg | Baden-Wurttemberg | Germany | 8.67243 | 49.3988 | Behringer (1987) |
| 543 | 1640.0 | 1640 | 1600 | 1 | NaN | Heidelberg | Freiburg | Baden-Wurttemberg | Germany | 8.67243 | 49.3988 | Behringer (1987) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10641 | 1643.0 | 1640 | 1600 | 1 | 0.0 | Bruges | West-Vlaanderen | Vlaanderen | Belgium | 3.22470 | 51.2094 | Vanysacker (1988) |
| 10642 | 1643.0 | 1640 | 1600 | 1 | 0.0 | Bruges | West-Vlaanderen | Vlaanderen | Belgium | 3.22470 | 51.2094 | Vanysacker (1988) |
| 10643 | 1649.0 | 1640 | 1600 | 1 | 0.0 | Bruges | West-Vlaanderen | Vlaanderen | Belgium | 3.22470 | 51.2094 | Vanysacker (1988) |
| 10644 | 1654.0 | 1650 | 1600 | 1 | 1.0 | Bruges | West-Vlaanderen | Vlaanderen | Belgium | 3.22470 | 51.2094 | Vanysacker (1988) |
| 10645 | 1657.0 | 1650 | 1600 | 1 | 0.0 | Bruges | West-Vlaanderen | Vlaanderen | Belgium | 3.22470 | 51.2094 | Vanysacker (1988) |
5137 rows × 12 columns
trialsCityGB = trialsDF.groupby(['city', 'lon', 'lat', 'gadm.adm0'])
trialsByCity = pd.DataFrame(trialsCityGB.tried.sum().reset_index())
trialsByCity
| city | lon | lat | gadm.adm0 | tried | |
|---|---|---|---|---|---|
| 0 | Aalen | 10.097100 | 48.836700 | Germany | 1 |
| 1 | Abensberg | 11.850800 | 48.817700 | Germany | 69 |
| 2 | Achern | 8.074920 | 48.626600 | Germany | 2 |
| 3 | Agen | 0.616363 | 44.203142 | France | 6 |
| 4 | Aichach | 11.136000 | 48.457700 | Germany | 23 |
| ... | ... | ... | ... | ... | ... |
| 783 | Zusmarshausen | 10.599500 | 48.398300 | Germany | 15 |
| 784 | Zutphen | 6.196058 | 52.142736 | Netherlands | 4 |
| 785 | Zwickau | 12.473372 | 50.710217 | Germany | 1 |
| 786 | Zwiesel | 13.237900 | 49.017800 | Germany | 3 |
| 787 | kotz | 10.283300 | 48.416700 | Germany | 4 |
788 rows × 5 columns
%conda install plotly
Note: you may need to restart the kernel to use updated packages.
import plotly.graph_objects as go
trialsByCity.loc[:,'text'] = trialsByCity['city'] + ', ' + trialsByCity['gadm.adm0']
fig = go.Figure(data=go.Scattergeo(
    lon=trialsByCity['lon'],
    lat=trialsByCity['lat'],
    text=trialsByCity['text'],
    mode='markers',
    marker={
        'opacity': 0.5,
        'color': trialsByCity['tried'],
        'colorscale': 'rainbow',
        'colorbar_title': 'Tried Count'
    }
))
fig.update_layout(
    title='European Witch Trials By City',
    geo_scope='europe'
)
fig.show()
The data is provided as an Access 97 database. To make it more usable, it was converted to CSV with a script.
#!/usr/bin/env python3
# From https://stackoverflow.com/questions/53687786/how-can-i-convert-an-ms-access-database-mdb-file-to-an-sqlite
import pandas_access as mdb
import pandas as pd
import numpy as np
from sqlalchemy import create_engine
import sys
import os
if len(sys.argv) != 3:
    print("{0} <MDB File> <Sqlite3 File>".format(sys.argv[0]))
    sys.exit(-1)

if os.path.isfile(sys.argv[2]):
    print("Refusing to modify existing database!")
    sys.exit(-1)

engine = create_engine('sqlite:///{0}'.format(sys.argv[2]), echo=False)

tlist = [tbl for tbl in mdb.list_tables(sys.argv[1])]

tables = {}
for tbl in tlist:
    print(f"loading {tbl}")
    # A few tables have sparse integer columns (empty values break the default
    # dtype inference), so we force nullable Int32 dtypes for those columns.
    if tbl == "WDB_Accused_family":
        tables[tbl] = mdb.read_table(sys.argv[1], tbl, dtype={
            'Age': pd.Int32Dtype()
        })
    elif tbl == "WDB_Ref_Parish":
        tables[tbl] = mdb.read_table(sys.argv[1], tbl, dtype={
            'PFID': pd.Int32Dtype()
        })
    elif tbl == "WDB_Trial":
        tables[tbl] = mdb.read_table(sys.argv[1], tbl, dtype={
            'Female_accusers': pd.Int32Dtype(),
            'Male_accusers': pd.Int32Dtype()
        })
    else:
        tables[tbl] = mdb.read_table(sys.argv[1], tbl)

for k in tables:
    tables[k].to_sql(k, con=engine)
Each table was exported to its own CSV file. These were then imported into pandas and merged together.
import pandas as pd
columns_to_drop = []
accused = pd.read_csv('data/witchcraft_survey/WDB_Accused.csv')
accused = accused.drop(['AccusedSystemId', 'Createdby', 'Createdate',
'Lastupdatedby', 'Lastupdatedon'], axis=1)
cases = pd.read_csv('data/witchcraft_survey/WDB_Case.csv')
demonic_pacts = pd.read_csv('data/witchcraft_survey/WDB_DemonicPact.csv')
demonic_pacts = demonic_pacts.drop(['DemonicRef', 'DemonicSystemId', 'DemonicID', 'Createdby', 'Createdate', 'Lastupdatedby',
'Lastupdatedon'], axis=1)
# print(demonic_pacts.columns)
trials = pd.read_csv('data/witchcraft_survey/WDB_Trial.csv')
trials = trials.drop(['Createdby', 'Createdate',
'Lastupdatedby', 'Lastupdatedon'], axis=1)
# print(trials.columns)
ordeals = pd.read_csv('data/witchcraft_survey/WDB_Ordeal.csv')
ordeals = ordeals.drop(['OrdealRef', 'OrdealSystemId', 'OrdealID', 'Createdby', 'Createdate',
'Lastupdatedby', 'Lastupdatedon'], axis=1)
# print(ordeals.columns)
tortures = pd.read_csv('data/witchcraft_survey/WDB_Torture.csv')
tortures = tortures.drop(['TortureRef', 'TortureSystemId', 'TortureID', 'Createdby', 'Createdate',
'Lastupdatedby', 'Lastupdatedon'], axis=1)
# print(tortures.columns)
weather = pd.read_csv('data/witchcraft_survey/WDB_WeatherModification.csv')
weather = weather.drop(['WeatherModificationRef', 'WeatherModificationSystemId',
'WeatherModificationID', 'Createdby',
'Createdate', 'Lastupdatedby', 'Lastupdatedon'], axis=1)
# print(weather.columns)
rmotif = pd.read_csv('data/witchcraft_survey/WDB_ReligiousMotif.csv')
rmotif = rmotif.drop(['MotifRef', 'MotifSystemId', 'MotifID',
'Createdby', 'Createdate', 'Lastupdatedby', 'Lastupdatedon'], axis=1)
# print(rmotif.columns)
wmagic = pd.read_csv('data/witchcraft_survey/WDB_WhiteMagic.csv')
wmagic = wmagic.drop(['WhiteMagicRef', 'WhiteMagicSystemId', 'WhiteMagicID',
'Createdby', 'Createdate', 'Lastupdatedby',
'Lastupdatedon'], axis=1)
# print(wmagic.columns)
wmeet = pd.read_csv('data/witchcraft_survey/WDB_WitchesMeetingPlace.csv')
wmeet = wmeet.drop(['MeetingPlaceRef', 'MeetingPlaceSystemId', 'MeetingPlaceID',
'Createdby',
'Createdate', 'Lastupdatedby', 'Lastupdatedon'], axis=1)
# print(wmeet.columns)
schange = pd.read_csv('data/witchcraft_survey/WDB_ShapeChanging.csv')
schange = schange.drop(['ShapeChangingRef', 'ShapeChangingSystemId', 'ShapeChangingID',
'Createdby', 'Createdate',
'Lastupdatedby', 'Lastupdatedon'], axis=1)
# print(schange.columns)
robject = pd.read_csv('data/witchcraft_survey/WDB_RitualObject.csv')
robject = robject.drop(['RitualObjectRef', 'RitualObjectSystemId', 'RitualObjectID',
'Createdby', 'Createdate', 'Lastupdatedby',
'Lastupdatedon'], axis=1)
# print(robject.columns)
pdamage = pd.read_csv('data/witchcraft_survey/WDB_PropertyDamage.csv')
pdamage = pdamage.drop(['PropertyDamageRef', 'PropertyDamageSystemId', 'PropertyDamageID',
'Createdby', 'Createdate',
'Lastupdatedby', 'Lastupdatedon'], axis=1)
# print(pdamage.columns)
music = pd.read_csv('data/witchcraft_survey/WDB_MusicalInstrument.csv')
music = music.drop(['MusicalInstrumentRef', 'MusicalInstrumentSystemId',
'MusicalInstrumentID', 'Createdby', 'Createdate', 'Lastupdatedby',
'Lastupdatedon'], axis=1)
# print(music.columns)
malice = pd.read_csv('data/witchcraft_survey/WDB_Malice.csv')
malice = malice.drop(['MaliceRef', 'MaliceSystemId', 'MaliceID',
'Createdby', 'Createdate', 'Lastupdatedby', 'Lastupdatedon'], axis=1)
# print(malice.columns)
fairy = pd.read_csv('data/witchcraft_survey/WDB_Elf_FairyElements.csv')
fairy = fairy.drop(['ElfFairyRef', 'ElfFairySystemId', 'ElfFairyID',
'Createdby', 'Createdate', 'Lastupdatedby',
'Lastupdatedon'], axis=1)
# print(fairy.columns)
devil = pd.read_csv('data/witchcraft_survey/WDB_DevilAppearance.csv')
devil = devil.drop(['DevilRef', 'DevilSystemId', 'DevilID',
'Createdby', 'Createdate', 'Lastupdatedby',
'Lastupdatedon'], axis=1)
# print(devil.columns)
wcdf = cases.merge(accused, on='AccusedRef', how='left', suffixes=['_c', '_w'])
wcdf = wcdf.merge(trials, on='CaseRef', how='left', suffixes=['_c', '_t'])
wcdf = wcdf.merge(demonic_pacts, on='CaseRef', how='left', suffixes=['_c', '_d'])
wcdf = wcdf.merge(ordeals, on='Trialref', how='left', suffixes=['_c', '_o'])
wcdf = wcdf.merge(tortures, on='Trialref', how='left', suffixes=['_c', '_o'])
wcdf = wcdf.merge(weather, on='CaseRef', how='left', suffixes=['_c', '_w'])
wcdf = wcdf.merge(rmotif, on='CaseRef', how='left', suffixes=['_c', '_m'])
wcdf = wcdf.merge(wmagic, on='CaseRef', how='left', suffixes=['_c', '_m'])
wcdf = wcdf.merge(wmeet, on='CaseRef', how='left', suffixes=['_c', '_m'])
wcdf = wcdf.merge(schange, on='CaseRef', how='left', suffixes=['_c', '_m'])
wcdf = wcdf.merge(robject, on='CaseRef', how='left', suffixes=['_c', '_m'])
wcdf = wcdf.merge(pdamage, on='CaseRef', how='left', suffixes=['_c', '_m'])
wcdf = wcdf.merge(music, on='CaseRef', how='left', suffixes=['_c', '_m'])
wcdf = wcdf.merge(malice, on='CaseRef', how='left', suffixes=['_c', '_m'])
wcdf = wcdf.merge(fairy, on='CaseRef', how='left', suffixes=['_c', '_m'])
wcdf = wcdf.merge(devil, on='CaseRef', how='left', suffixes=['_c', '_m'])
wcdf = wcdf.drop(['CaseRef', 'Trialref', 'CaseId', 'CaseSystemId'], axis=1)
wcdf
| NamedIndividual | AccusedRef | CaseStart_date | CaseStart_date_as_date | Case_date | Case_date_as_date | Age_at_case | CaseCommonName | Commission | Complaint | ... | ShapeChanging_Type | Details | RitualObject_Type | PropertyDamage_Type | MusicalInstrument_Type | MusicalInstrument_Text | Causeofmalice | ElfFairy_Type | Devil_Type | Devil_Text | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | NaN | 2/8/1649 | 01/00/00 00:00:00 | 2/8/1649 | 01/00/00 00:00:00 | NaN | 3 unnamed witches | commission for execution | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | -1.0 | A/EGD/89 | 27/1/1591 | 01/00/00 00:00:00 | 8/5/1591 | 01/00/00 00:00:00 | NaN | Kerington,Cristine | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | -1.0 | A/EGD/989 | 6/5/1627 | 01/00/00 00:00:00 | 6/5/1627 | 01/00/00 00:00:00 | NaN | Craig,Christiane | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | -1.0 | A/EGD/990 | NaN | NaN | 25/4/1627 | 01/00/00 00:00:00 | NaN | Ronaldson,Margaret | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | -1.0 | A/EGD/991 | NaN | NaN | 3/5/1627 | 01/00/00 00:00:00 | NaN | Broun,Bessie | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 392979 | -1.0 | A/LA/3240 | NaN | NaN | 4/1568 | 01/00/00 00:00:00 | NaN | Johnnestoun,Cristeane | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 392980 | -1.0 | A/LA/3241 | NaN | NaN | 4/1568 | 01/00/00 00:00:00 | NaN | Curchan,Jonet | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 392981 | -1.0 | A/LA/3242 | NaN | NaN | 4/1568 | 01/00/00 00:00:00 | NaN | Chalmer,James | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 392982 | -1.0 | A/LA/3243 | NaN | NaN | 4/1568 | 01/00/00 00:00:00 | NaN | Campbell,Catherine | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 392983 | -1.0 | A/LA/3244 | NaN | NaN | 4/1568 | 01/00/00 00:00:00 | NaN | Jak,Cristeane | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
392984 rows × 218 columns
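When chaining this many left merges, it can be worth checking how many rows actually found a match in each table; pandas' `indicator` flag records this. A sketch with made-up toy frames mirroring two of the tables above (the IDs are invented):

```python
import pandas as pd

# Toy stand-ins for two of the tables merged above
cases = pd.DataFrame({'CaseRef': ['C/1', 'C/2', 'C/3']})
trials = pd.DataFrame({'CaseRef': ['C/1', 'C/2'], 'Trialref': ['T/1', 'T/2']})

# indicator=True adds a _merge column recording which rows found a match
m = cases.merge(trials, on='CaseRef', how='left', indicator=True)
matched = (m['_merge'] == 'both').sum()
```

On the real tables this would flag any join key mismatches (e.g. `CaseRef` vs. `Trialref` spellings) before they silently produce all-NaN columns.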
columns_of_interest = ['Demonic_Type', 'Ordealtype', 'Torturetype', 'WeatherModification_Type',
'Motif_Type', 'WhiteMagic_Type', 'MeetingPlace_Place', 'ShapeChanging_Type',
'RitualObject_Type', 'PropertyDamage_Type', 'MusicalInstrument_Type', 'Causeofmalice',
'ElfFairy_Type', 'Devil_Type' ]
# variable_name, variable_value, execution_percent
execution_rate = []
execution_count = []
variable_name = []
variable_value = []
for c in columns_of_interest:
    plot_df = wcdf[['AccusedRef', 'Execution', c]].dropna(subset=['Execution'])
    for value in plot_df[c].value_counts().index:
        variable_name.append(c)
        variable_value.append(value)
        execution_count.append(plot_df[plot_df[c] == value].groupby('AccusedRef').count()['Execution'].shape[0])
        execution_rate.append(plot_df[plot_df[c] == value].groupby('AccusedRef').max()['Execution'].mean())
long_df = pd.DataFrame({'execution_rate': execution_rate, 'execution_count': execution_count, 'variable_name': variable_name, 'variable_value': variable_value})
long_df
| execution_rate | execution_count | variable_name | variable_value | |
|---|---|---|---|---|
| 0 | 0.326797 | 153 | Demonic_Type | Devil's Mark |
| 1 | 0.283721 | 215 | Demonic_Type | Anti-baptism |
| 2 | 0.281250 | 128 | Demonic_Type | Sex |
| 3 | 0.284211 | 95 | Demonic_Type | New name |
| 4 | 0.382353 | 68 | Demonic_Type | Head and foot |
| ... | ... | ... | ... | ... |
| 327 | 0.250000 | 4 | Devil_Type | Fairy |
| 328 | 1.000000 | 2 | Devil_Type | Human Devil |
| 329 | 0.000000 | 1 | Devil_Type | Insect Devil |
| 330 | 1.000000 | 1 | Devil_Type | Child Devil |
| 331 | 1.000000 | 1 | Devil_Type | Baby |
332 rows × 4 columns
for variable in long_df.variable_name.value_counts().index:
    var_df = long_df[long_df.variable_name == variable]
    rc = var_df.execution_count / var_df.execution_count.sum()
    long_df.loc[long_df.variable_name == variable, 'relative_count'] = rc
long_df
| execution_rate | execution_count | variable_name | variable_value | relative_count | |
|---|---|---|---|---|---|
| 0 | 0.326797 | 153 | Demonic_Type | Devil's Mark | 0.162076 |
| 1 | 0.283721 | 215 | Demonic_Type | Anti-baptism | 0.227754 |
| 2 | 0.281250 | 128 | Demonic_Type | Sex | 0.135593 |
| 3 | 0.284211 | 95 | Demonic_Type | New name | 0.100636 |
| 4 | 0.382353 | 68 | Demonic_Type | Head and foot | 0.072034 |
| ... | ... | ... | ... | ... | ... |
| 327 | 0.250000 | 4 | Devil_Type | Fairy | 0.012539 |
| 328 | 1.000000 | 2 | Devil_Type | Human Devil | 0.006270 |
| 329 | 0.000000 | 1 | Devil_Type | Insect Devil | 0.003135 |
| 330 | 1.000000 | 1 | Devil_Type | Child Devil | 0.003135 |
| 331 | 1.000000 | 1 | Devil_Type | Baby | 0.003135 |
332 rows × 5 columns
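Since several of these execution rates rest on only one or two accused (e.g. the 100% rates for Devil_Type values with a count of 1), it can help to require a minimum count before ranking categories by rate. A sketch on a small hand-made frame mirroring long_df's columns (the values are illustrative, not recomputed from the data):

```python
import pandas as pd

# Hand-made frame with the same columns as long_df above
long_df = pd.DataFrame({
    'execution_rate': [0.33, 1.00, 0.28, 0.00],
    'execution_count': [153, 2, 215, 1],
    'variable_name': ['Demonic_Type', 'Devil_Type', 'Demonic_Type', 'Devil_Type'],
    'variable_value': ["Devil's Mark", 'Human Devil', 'Anti-baptism', 'Insect Devil'],
})

# Rates based on a handful of accused are noisy; require a floor before ranking
reliable = long_df[long_df.execution_count >= 30]
top = reliable.sort_values('execution_rate', ascending=False)
```

The threshold of 30 is an arbitrary choice for the sketch; any floor large enough to smooth out single-case rates would do.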
Various demonic pacts, ordeals (tests), and tortures are captured in the dataset. Surprisingly, sleep deprivation is the most common torture, though it is rarely seen in popular media.
wcdf.Torturetype.value_counts().plot.bar()
<AxesSubplot:>
Though ducking is a famous type of witch ordeal (it even features in Monty Python and the Holy Grail), at least in the Scottish witchcraft trials it is recorded only once.
wcdf.Ordealtype.value_counts().plot.bar()
wcdf.Ordealtype.value_counts()
Pricking                     188
Searching                     45
Bierricht (corpse bleeds)     16
Ducking                        1
Name: Ordealtype, dtype: int64
wcdf.Demonic_Type.value_counts().plot.bar()
<AxesSubplot:>
In order to start examining the relation between different ordeals and tortures, they were plotted against each other. Unfortunately many cases lack this data, so there is limited information to draw on. Ordeals and torture were probably highly localized: while pricking corresponds with the prevalence of all torture types, searching for Devil's marks is only found alongside the irons torture.
ord_tort = wcdf.groupby(['Ordealtype', 'Torturetype']).size().reset_index(name='count')
ord_tort['Size'] = ord_tort['count'] * 100
ord_tort.plot.scatter(x='Ordealtype', y='Torturetype', s='Size', figsize=(5, 6))
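The same co-occurrence counts that feed the bubble plot could be tabulated directly with `pd.crosstab`. A sketch with a few made-up rows standing in for the survey's `Ordealtype`/`Torturetype` values:

```python
import pandas as pd

# Illustrative stand-in for wcdf (real labels come from the Scottish survey)
wcdf = pd.DataFrame({
    'Ordealtype':  ['Pricking', 'Pricking', 'Searching', 'Pricking'],
    'Torturetype': ['Sleep deprivation', 'Irons', 'Irons', 'Sleep deprivation'],
})

# Rows: ordeals, columns: tortures, cells: co-occurrence counts
ct = pd.crosstab(wcdf['Ordealtype'], wcdf['Torturetype'])
print(ct)
```

A heatmap of such a crosstab (e.g. `sns.heatmap(ct)`) would show the same localization pattern as the bubble scatter.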
While the verdict of every trial isn't recorded, we do have over 500 results.
wcdf.Verdict.value_counts()
Guilty         477
Not Guilty      60
Not Proven      11
Half Guilty      7
Name: Verdict, dtype: int64
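From the counts above, the overall conviction rate among recorded verdicts can be read off directly (counting only "Guilty" as a conviction):

```python
# Verdict counts taken from the value_counts output above
verdicts = {'Guilty': 477, 'Not Guilty': 60, 'Not Proven': 11, 'Half Guilty': 7}

total = sum(verdicts.values())              # 555 recorded verdicts
guilty_rate = verdicts['Guilty'] / total    # share of recorded verdicts that were Guilty
print(f"{guilty_rate:.1%}")                 # prints "85.9%"
```

So roughly six in seven trials with a recorded verdict ended in a conviction.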
We can also look at socioeconomic status vs. the verdict of the trial. Note that all of the Lairds/Barons and Nobility/Chiefs were found Not Guilty or Not Proven. Every other category, including the upper class, which was not as protected as the nobility, had a high ratio of Guilty verdicts. Most of the accused, though, were middle class or lower.
wcdf.SocioecStatus.value_counts()
import matplotlib.pyplot as plt

ses_verdict = wcdf.groupby(['SocioecStatus', 'Verdict']).size().reset_index(name='count')
ses_verdict['Size'] = ses_verdict['count'] * 100
ses_verdict.plot.scatter(x='SocioecStatus', y='Verdict', s='Size', figsize=(12, 8))
plt.xticks(rotation=90);
wcdf.SocioecStatus.value_counts().plot.bar(figsize=(12,7))
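The per-status verdict ratios described above can be made explicit by normalizing a crosstab by row, so each class becomes directly comparable regardless of its size. A sketch with invented rows (the real values come from `wcdf`'s `SocioecStatus`/`Verdict` columns):

```python
import pandas as pd

# Invented example rows; real data comes from wcdf
df = pd.DataFrame({
    'SocioecStatus': ['Nobility/Chiefs', 'Nobility/Chiefs', 'Lower', 'Lower', 'Lower'],
    'Verdict':       ['Not Guilty', 'Not Proven', 'Guilty', 'Guilty', 'Not Guilty'],
})

# normalize='index' converts each row to proportions summing to 1
rates = pd.crosstab(df['SocioecStatus'], df['Verdict'], normalize='index')
print(rates.round(2))
```

With real data, the nobility rows would show a Guilty proportion of 0 while the lower-status rows would show proportions near the ~86% overall conviction rate.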
If we look at the number of male and female accusers, we can see that female accusers are far more common than male accusers, especially in high numbers. Additionally, male accusers seem less associated with guilty verdicts than female accusers: most of the non-guilty verdicts occur in cases with a high number of male accusers but few female accusers.
import numpy as np
import seaborn as sns
wcdf.Male_accusers.value_counts()
wcdf_acc = wcdf[(wcdf.Male_accusers != 0) & (wcdf.Female_accusers != 0)]
sns.set(rc = {'figure.figsize':(8,16)})
sns.stripplot(data=wcdf_acc, x='Female_accusers', y='Male_accusers', hue='Verdict', s=10, alpha=0.4)#, x_jitter=True, y_jitter=True)
plt.axis('scaled')
import seaborn as sns
import matplotlib.pyplot as plt

data = long_df[long_df.execution_count >= 10]
for variable in data.variable_name.value_counts().index:
    cmap = sns.color_palette("flare", as_cmap=True)
    variable_data = data[data.variable_name == variable]
    ax = variable_data.plot(kind='bar', sharex=False, x='variable_value', y='execution_rate',
                            color=cmap(variable_data.relative_count), figsize=(12, 7))
    ax.set_xlabel(variable)
    ax.set_ylim((0, 1))
    # Annotate each bar with its execution count
    for rect, label in zip(ax.patches, variable_data.execution_count):
        height = rect.get_height()
        ax.text(rect.get_x() + rect.get_width() / 2, height + 0.005, label,
                ha="center", va="bottom")
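On matplotlib 3.4+, the manual `ax.text` loop above can be replaced with a single `ax.bar_label` call. A minimal sketch on toy bars (the category labels and counts are made up for illustration):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bars = ax.bar(["Devil's Mark", 'Anti-baptism', 'Sex'], [0.33, 0.28, 0.28])
# One call labels every bar; here the labels are (made-up) execution counts
ax.bar_label(bars, labels=[153, 215, 128], padding=3)
ax.set_ylim((0, 1))
```

`bar_label` handles the per-bar placement arithmetic (center of the bar, offset above its top) automatically.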
Based on our analysis we came up with the following answers to our research questions:
How did witchcraft trials change over time and geography?
In our various plots, we looked at how the number of trials changed over the centuries. From these plots, we can see that witchcraft trials peaked during the 17th century, with trials mostly occurring in Germany, Switzerland, Scotland, and Spain.
How did population and urbanization affect witchcraft trials?
To determine whether population and urbanization had an effect on witchcraft trials, we used KMeans clustering to see whether countries with similar population and urbanization statistics had a comparable number of trials. The clustering plots showed no significant evidence that population and urbanization affected witchcraft trials.
What factors were most likely to result in an execution?
Using the Scottish witchcraft survey data, we identified a variety of variables that impact execution rates and showed the relations between them. Most notably, torture was very unlikely to result in execution, as were ordeals. The involvement of storms as evidence in a trial, however, was likely to result in a conviction. Furthermore, we looked at socioeconomic factors and the gender and number of the accusers.
Did climate or weather conditions drive witchcraft trials?
From the four scatterplots (trials vs. weather, trials vs. temperature, trials.mil vs. weather, and trials.mil vs. temperature) we observed that more trials occurred in weather with negative values, which represent less precipitation, and in colder temperatures; the same held for trials.mil. From this we hypothesized that witchcraft trials may have coincided with drought, and with cold severe enough to destroy crops and hence food supplies. We know from the Scottish witchcraft survey dataset that storms cited as evidence in witchcraft trials were likely (~75%) to result in an execution, so there is a relationship between witchcraft trials and the weather. Given the correlation between cold temperatures and trials, the detrimental effects of cold weather may have been blamed on witches.